Solution: Use NULL as a Unique Value
Let’s see how using NULL as a unique value is the best solution.
Most problems with NULL values are based on a common misunderstanding of the behavior of SQL’s three-valued logic. This can be a challenge for programmers accustomed to the conventional true/false logic implemented in most other languages. But we can easily handle NULL values in SQL queries after a little study of how they work.
NULL in scalar expressions#
Let’s suppose Stan is thirty years old while Oliver’s age is unknown. If you were asked whether Stan is older than Oliver, your only possible answer would be “I don’t know.” If you were asked whether Stan is the same age as Oliver, your answer would still be “I don’t know.” And if you were asked to tell the sum of Stan and Oliver’s ages, your answer would be just the same: “I don’t know.”
Are NULLs Rational?
Now, let’s suppose that Charlie’s age is also unknown. If you were asked whether Oliver’s age is equal to Charlie’s age, your answer would still be “I don’t know.” This shows why the result of a comparison like NULL = NULL
is also NULL.
The following table describes some cases where programmers expect one result but get something different.
Expression | Expected | Actual | Reason |
---|---|---|---|
NULL = 0 |
TRUE | NULL | Null is not zero. |
NULL = 12345 |
FALSE | NULL | Unknown if the unspecified value is equal to a given value. |
NULL <> 12345 |
TRUE | NULL | Also unknown if it’s unequal. |
NULL + 12345 |
12345 | NULL | Null is not zero. |
NULL || 'string' |
‘string’ | NULL | Null is not an empty string. |
NULL = NULL |
TRUE | NULL | Unknown if one unspecified value is the same as another. |
NULL <> NULL |
FALSE | NULL | Also unknown if they’re different. |
Of course, these examples apply not only when using the NULL keyword but also to any column or expression whose value is null.
NULL in boolean expressions#
NULL is neither true nor false. A null value certainly isn’t true, but it isn’t the same as false either. If it were, then applying NOT to a NULL value would result in true. But that’s not the way it works; NOT (NULL)
results in another NULL. This confuses some people who try to use boolean expressions with NULL.
The table “NULL in Boolean Expressions” describes some cases where programmers expect one result but get something different.
Expression | Expected | Actual | Reason |
---|---|---|---|
NULL AND TRUE |
FALSE | NULL | Null is not false. |
NULL AND FALSE |
FALSE | FALSE | Any truth value AND FALSE is false. |
NULL OR FALSE |
FALSE | NULL | Null is not false. |
NULL OR TRUE |
TRUE | TRUE | Any truth value OR TRUE is true. |
NOT (NULL) |
TRUE | NULL | Null is not false. |
The Right Result for the Wrong Reason
Searching for NULL#
Since neither equality nor inequality returns true when comparing one value to a NULL value, we need some other operation to search for a NULL value. Older SQL standards define the IS NULL
predicate, which returns true if its single operand is NULL. The opposite, IS NOT NULL
, returns false if its operand is NULL.
Let’s look for the Bugs
table values where the value for assigned_to
is NULL. Let’s see what happens if we press “RUN” in the following playground.
Now, let’s check the same thing for NOT NULL
in the following playground.
The SQL-99 standard defines another comparison predicate, IS DISTINCT FROM
. This works like an ordinary inequality operator <>
, except that it always returns true or false, even when its operands are null. Unfortunately, this predicate is not available in any MySQL versions now. It’s true that it used to relieve us from writing tedious expressions to test IS NULL
before comparing to a value. The following two queries are equivalent; let’s check them one by one. The first query is using IS NULL
to check for entries having NULL values.
The following query uses <>
to search for NULL in the column assigned_to
.
We can use this predicate with query parameters to which we want to send either a literal value like 1 or NULL:
Try replacing the question mark with some literal value first, like the literal 1, to see the processing of the query in the playground below.
Support for IS DISTINCT FROM
is inconsistent among database brands. PostgreSQL, IBM DB2, and Firebird do support it, whereas Oracle and Microsoft SQL Server don’t support it yet. MySQL offers a proprietary operator <=>
that works like IS NOT DISTINCT FROM
.
Declare columns NOT NULL
#
It’s recommended to declare a NOT NULL
constraint on a column for which a NULL would break a policy in our application or otherwise be nonsensical. It’s better to allow the database to enforce constraints uniformly rather than rely on application code.
For example, it’s reasonable to assume that any entry for the date_reported
, reported_by
, and status
columns in the Bugs
table would have a non-NULL value. Likewise, rows in child tables like Comments
must also include non-NULL bug_id
values, as they reference existing bugs. We should declare these columns with the NOT NULL
option.
Some people recommend that we define a DEFAULT
for every column so that if we omit the column in an INSERT
statement, the column gets the DEFAULT
value instead of null. That’s good advice for some columns but not for other columns. For example, Bugs.reported_by
should not be NULL But what default, if any, should we declare for this column? It’s valid and common for a column to need a NOT NULL
constraint and yet have no logical default value.
Dynamic defaults#
In some queries, we may need to force a column or expression to be non-NULL for the sake of simplifying the query logic, but we don’t want its value to be stored. What we need is a way to set a default value for a given column or expression but only in an ad hoc way for specific queries. For this, we can use the COALESCE()
function. This function accepts a variable number of arguments and returns its first non-NULL argument.
Returning to the story about concatenating users’ names in the beginning of this chapter, you could use COALESCE()
to make an expression that uses a single space in place of the middle initial, so a NULL-valued middle initial doesn’t make the whole expression become NULL.
COALESCE()
is a standard SQL function. Some database brands support a similar function by another name, such as NVL()
or ISNULL()
.